dbCRAF: a curated knowledgebase for regulation of radiation response in human cancer

Abstract Radiation therapy (RT) is one of the primary treatment modalities of cancer, with 40-60% of cancer patients benefiting from RT during their treatment course. The intrinsic radiosensitivity or acquired radioresistance of tumor cells affects the response to RT and clinical outcomes in patients. Thus, mining the regulatory mechanisms of tumor radiosensitivity or radioresistance that have been verified by biological experiments and computational analysis methods will enhance the overall understanding of RT. Here, we describe a comprehensive database, dbCRAF (http://dbCRAF.xialab.info/), that documents and annotates the factors (1,677 genes, 49 proteins and 612 radiosensitizers) linked with radiation response, including radiosensitivity and radioresistance in cancer cells and prognosis in cancer patients receiving RT. On the one hand, dbCRAF enables researchers to directly access knowledge on the regulation of radiation response in human cancer that is buried in the vast literature. On the other hand, dbCRAF provides four flexible modules to analyze and visualize the functional relationships between these factors and clinical outcomes, KEGG pathways and target genes. In conclusion, dbCRAF serves as a valuable resource for elucidating the regulatory mechanisms of radiation response in human cancers as well as for the improvement of RT options.


Introduction
As an intricate disease, cancer has greatly reduced human life expectancy worldwide (1,2). Radiation therapy (RT) is one of the prominent modalities for treating intricate diseases, and it can be used alone or in combination with other treatments to cure tumors and relieve tumor-related symptoms in cancer patients (3). However, the effects of RT are not specific to tumor cells and may cause side effects or toxicity to surrounding exposed organs and tissues (e.g. RT injury, secondary cancer) (3-5). Therefore, it is important to examine the molecular mechanism of radiation response that enhances tumor killing while reducing toxicity to surrounding normal tissues, which can ensure better therapeutic outcomes and provide meaningful improvement for cancer patients (6-8).
Many researchers have dedicated their efforts to studying the molecular mechanisms of radiosensitivity and radioresistance in human tumor cell lines or of prognosis in cancer patients receiving RT, but these valuable findings are scattered across multiple sources (9-11). Specifically, emerging evidence in the literature has connected more coding genes and various types of non-coding RNAs (ncRNAs), such as microRNAs (miRNAs), long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs), to radioresistance and radiosensitivity in human cancers. For example, Han et al. demonstrated that downregulation of PVT1 inhibits radioresistance in nasopharyngeal carcinoma cells (12). Wang et al. found that circ_0067835 knockdown suppresses colorectal cancer (CRC) progression and strengthens CRC cell radiosensitivity (13). Additionally, Wen et al. constructed the dbCRSR database, which contains 395 coding genes, 119 miRNAs and 306 chemical compounds that can modulate radiosensitivity (14). Moreover, with the development of sequencing technology, an extensive literature has used computational analysis methods to elucidate the complex relationship between genes and cancer prognosis with RT (15-19). These findings also provide valuable insights for improving the clinical outcomes of RT. To sum up, the lack of an effective collection of these scattered data makes it difficult to obtain an overarching understanding of prior research on the molecular mechanisms of radiation response, thus hampering the further progress of RT treatment.
To meet these needs, we developed an informative and functional database (dbCRAF, http://dbCRAF.xialab.info/), which contains various factors (1,677 genes, 612 radiosensitizers and 49 proteins) associated with radiosensitivity and radioresistance in cancer cells and with prognosis in cancer patients receiving RT. In addition, we collated miRNA-target, lncRNA-target, circRNA-target and drug-target interactions for ncRNAs and radiosensitizers associated with radiation response, allowing users to construct radiation response interaction networks. Furthermore, four analytics modules were designed to enable users to extract key factors or potential modulators affecting radiation response based on the collected data. In brief, dbCRAF aspires to be a one-stop online service for integrating and analyzing valuable information on cellular radiation response.

Materials and methods
This part introduces data collection and curation, interaction data processing, data analysis and visualization, and database design and web interface implementation for the dbCRAF database.

Data collection and curation
To conduct a systematic review of the literature, we first searched PubMed articles published between January 2000 and February 2022 using a list of keywords (Supplementary Table S1) related to radiosensitivity, radioresistance and RT prognosis prediction. Second, we performed full-text screening to assess the eligibility of papers and extracted key information from the relevant articles using PubTator (20), including publication information, the factor (genes, proteins, radiosensitizers) that could regulate radiation response, cancer type, the regulation mode of each factor, gene expression fold change, the cytotoxicity of radiosensitizers, the cell line name, the type of model organism, the type of ray, the cancer tissue source, sample size, collection time, the type of radiotherapy and survival endpoint. Notably, we defined each piece of evidence that linked a factor to radiation response in cancer as an association. Third, to guarantee consistent names for all obtained factors and related cancers, we unified the gene names using the gene symbol-alias conversion table from the HGNC database (2022.4.12 version) (21), and the radiosensitizer names using the 'Generic Name' from the DrugBank database (22). Fourth, we harmonized cancer names with disease terms in the MeSH database (2022.8.8 version) according to the location at which the tumor occurred (23). Fifth, we retained the original wording for the type of RT, cancer tissue source, the type of ray and survival endpoint, because many publications describe these fields in multiple ways. This decision enables the database to provide a more comprehensive representation of the data. Lastly, we added the subcellular location annotation from the GeneCards database to explore the relationship between different subcellular structures and gene products related to radiation response (24).
All the above information was processed using Python or R scripts and manually verified by at least two individuals to ensure its accuracy. The scripts used for processing can be downloaded from dbCRAF.
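The name-unification step can be illustrated with a minimal Python sketch. It is not the authors' script: the alias table below is a tiny hand-written stand-in for the HGNC symbol-alias conversion table, and the function names `unify_gene_name` and `unify_all` are hypothetical. Upper-casing the input is a simplification that would not suit symbols containing lowercase letters.

```python
# Hypothetical stand-in for the HGNC symbol-alias table; a real run would
# load the full table from an HGNC download instead of these three rows.
ALIAS_TO_SYMBOL = {
    "HIF-1ALPHA": "HIF1A",
    "ERBB1": "EGFR",
    "HER1": "EGFR",
}
APPROVED = {"HIF1A", "EGFR", "ATM"}


def unify_gene_name(raw):
    """Return the approved symbol for a raw gene name, or None if unknown."""
    key = raw.strip().upper()  # simplification: case-insensitive matching
    if key in APPROVED:
        return key
    return ALIAS_TO_SYMBOL.get(key)


def unify_all(names):
    """Split names into mapped symbols and names left for manual review."""
    mapped, unresolved = [], []
    for name in names:
        symbol = unify_gene_name(name)
        if symbol is None:
            unresolved.append(name)
        else:
            mapped.append(symbol)
    return mapped, unresolved


mapped, unresolved = unify_all(["HIF-1alpha", " egfr ", "HER1", "foo"])
```

Names that cannot be resolved are returned separately, which mirrors the manual verification step described above rather than silently dropping them.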

Interaction data processing
To enable users to mine more information based on the associations collected in dbCRAF and to comprehend the mechanisms underlying radiation response regulation, we gathered four kinds of interaction data and the target genes of miRNAs, lncRNAs, circRNAs and radiosensitizers related to radiation response, as shown in Supplementary Table S2. We first collected the miRNA-target interactions of Homo sapiens with the support type 'Functional miRNA-target interactions' from the miRTarBase database (25). Then, we obtained the human lncRNA-target interactions that were confirmed by low-throughput methods such as RT-qPCR and western blot from the LncRNA2Target database (26). Next, we took the intersection of potential circRNA-target interactions from the CircInteractome (27) and Circbank (28) databases to improve confidence in these results, which are derived from predictive tools. Moreover, radiosensitizer (drug)-target interactions were retrieved from the DrugBank, PharmGKB (29) and CMap (30) databases. Finally, we also integrated protein-protein interaction (PPI) data for the target genes and the genes recorded in dbCRAF based on the STRING database (31) to infer the overall regulatory network.
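Taking the intersection of predicted circRNA-target pairs from two sources amounts to a set operation on normalized pairs. The sketch below assumes each source has already been parsed into (circRNA, target) tuples; the identifiers are placeholders and do not come from CircInteractome or Circbank, and the normalization rule is an assumption.

```python
def normalize(pair):
    """Normalize a (circRNA, target) pair so trivial formatting differences match."""
    circ, target = pair
    return (circ.strip().lower(), target.strip().upper())


def shared_interactions(source_a, source_b):
    """Return the pairs predicted by both sources."""
    return {normalize(p) for p in source_a} & {normalize(p) for p in source_b}


# Placeholder identifiers for illustration only.
source_a = [("circ_A", "target_1"), ("circ_A", "target_2"), ("circ_B", "target_3")]
source_b = [("CIRC_A ", "TARGET_1"), ("circ_B", "target_9")]

high_confidence = shared_interactions(source_a, source_b)
```

Only pairs present in both lists survive, which trades recall for confidence in the same way the text describes.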

Data analysis and visualization
To further explore the association data and four kinds of interaction data, we constructed four analysis blocks equipped with visual aids: Word Cloud graphs, Kaplan-Meier survival analysis, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis (32) and interaction network visualization. First, Word Cloud allows users to count and visualize the associations between factors and cancer recorded in dbCRAF. The size of each factor in the Word Cloud is proportional to its number of associations, which helps users identify the factors with the most associations. Second, we collected the RNA-seq and clinical data from The Cancer Genome Atlas (TCGA) database (33) and investigated whether the expression levels of miRNAs or mRNAs recorded in dbCRAF affected the clinical prognosis of patients receiving RT at different tumor stages (stages I-IV) in one or more types of cancer. The results are presented as Kaplan-Meier survival curves, enabling users to assess the impact of these factors on patient outcomes. Third, KEGG pathway enrichment analysis lets users annotate genes related to radiation response according to the pathways involved, and the results are presented as a bubble chart. This visualization helps users understand the functional roles of the identified factors by displaying enriched pathways. Finally, interaction network visualization enables users to depict the interaction relationships of factors in dbCRAF, thus facilitating the screening of key genes and potential modulators of radiation response.
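For readers unfamiliar with the Kaplan-Meier curves mentioned above, the following Python sketch implements the standard product-limit estimator. It is a generic illustration, not the code behind the dbCRAF Analysis page; the function name and the toy follow-up data are assumptions.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time per patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving  # events and censored patients both leave the risk set
    return curve


# Toy data: four patients, the second one censored at time 2.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

In practice, patients would first be split into high- and low-expression groups for a given miRNA or mRNA, and one curve would be estimated per group.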

Database content summary
After excluding duplicate articles with the same PubMed ID or title, we retained more than 10,000 records. In particular, we classified publications that explored the molecular mechanism of radiosensitivity and radioresistance through over-expression or knock-down assays as 'Experiment', and labeled those that predicted radiation response and RT prognosis of cancer patients through computational analysis as 'Prediction'. We then screened the full-text records to assess their relevance and extracted key information. dbCRAF currently contains 4,448 association entries between factors and cancer from 2,670 publications, and it covers 1,677 genes, 612 radiosensitizers and 49 proteins linked with 30 types of cancer.
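The duplicate-removal rule (same PubMed ID or same title) can be sketched as follows. The record layout with `pmid` and `title` keys, the title normalization and the sample records are all assumptions made for illustration.

```python
def deduplicate(records):
    """Keep the first record for each PubMed ID and each normalized title."""
    seen_ids, seen_titles = set(), set()
    kept = []
    for rec in records:
        pmid = rec.get("pmid")
        # Normalize case and whitespace so near-identical titles match.
        title = " ".join(rec.get("title", "").lower().split())
        if (pmid and pmid in seen_ids) or (title and title in seen_titles):
            continue
        if pmid:
            seen_ids.add(pmid)
        if title:
            seen_titles.add(title)
        kept.append(rec)
    return kept


# Made-up records; the IDs and titles are placeholders.
records = [
    {"pmid": "1", "title": "Gene X and radioresistance"},
    {"pmid": "1", "title": "Gene X and radioresistance."},
    {"pmid": "2", "title": "gene x  and Radioresistance"},
    {"pmid": "3", "title": "A different study"},
]
unique = deduplicate(records)
```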
In comparison to the dbCRSR database, the dbCRAF database covers literature over a longer time span, as indicated in Table 1. Additionally, dbCRAF provides information on multiple mechanisms of radiation response, so it records a greater number of associations between factors and radiation response in cancer. Moreover, dbCRAF describes the mode through which each factor modulates the radiation response in cancer. These modes include the addition of radiosensitizers, the occurrence of gene mutation, and changes in gene expression level, gene methylation status or protein expression. In summary, the information contained in dbCRAF is more comprehensive.
In addition, we made three findings by analyzing the 4,448 associations recorded in dbCRAF. First, the number of studies that focused on molecular mechanisms of radiation response increased year by year (Figure 1A). This indicates the growing interest in understanding the molecular mechanisms of radiation response and its potential applications in cancer treatment. Meanwhile, studies published since 2018 have integrated the analysis of large genomic or transcriptome datasets with biological experiments to efficiently screen and verify genes of interest. This approach reduces the limitations imposed by prior knowledge or assumptions and provides stronger evidence for identifying potential new associations between factors and cancer. Second, more than half of the associations were related to head and neck neoplasms, lung neoplasms and breast neoplasms (Figure 1B), which might be due to their higher incidence or the wider use of RT options, leading to more relevant studies (1,34-36). Third, the largest number of factors involved in regulating cancer radiation response were genes, followed by radiosensitizers (Figure 1C). Furthermore, we separately counted the five categories of genes and found that, besides protein-coding genes, a growing number of ncRNAs have been documented in the radiation response-related literature (37,38).

Hot protein-coding gene and enriched pathways
After counting 2,658 associations between coding genes and cancer, we found a total of 1,308 coding genes. Among them, HIF1A, EGFR and ATM were the most extensively studied coding genes, and their expression levels could influence the response of multiple tumors to radiation (Figure 2A). Next, we grouped all coding genes collected in 'Experiment' and 'Prediction' articles (793 versus 666, respectively) and obtained 151 genes shared between them (Figure 2B). This indicates that both biological experiments and computational methods have identified these 151 genes as potential genetic biomarkers for prognosis, radiosensitivity or radioresistance in human cancer. This finding suggests the possibility of optimizing RT effects based on a patient's genomic profile and intrinsic radiosensitivity (39). Additionally, we analyzed the 515 coding genes related to cancer prognosis that were identified in the 'Prediction' articles but not in the 'Experiment' articles to uncover more potential genetic biomarkers. Specifically, we performed KEGG pathway enrichment analysis on the two groups of coding genes with or without experimental evidence of regulating radiation response (793 versus 515). The results showed that four pathways were shared by both groups of genes: IL-17 signaling pathway, pathways of neurodegeneration-multiple diseases, pathways in cancer, and proteoglycans in cancer (Figure 2C). In summary, 137 of the 515 coding genes are involved in one or more of these four pathways (Supplementary Table S3), and it is likely that they regulate radiosensitivity, radioresistance or prognosis in cancer through their involvement in these functional pathways.
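The gene-set counts above are mutually consistent, and pathway enrichment of this kind is commonly assessed with a one-sided hypergeometric (over-representation) test. The Python sketch below checks the counts and shows such a test; the text does not state which enrichment tool or statistic dbCRAF uses, so the test itself and the pathway numbers in the example are assumptions.

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes in the pathway,
    n: genes in the query list, k: query genes found in the pathway.
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


# Gene-set bookkeeping reported in the text.
experiment, prediction, shared = 793, 666, 151
assert experiment + prediction - shared == 1308  # all coding genes
assert prediction - shared == 515                # 'Prediction'-only genes

# Hypothetical pathway: 40 of 515 query genes fall in a 500-gene pathway
# within a 20,000-gene background (all four numbers are made up).
p = enrichment_pvalue(k=40, n=515, K=500, N=20000)
```

A real analysis would repeat the test for every KEGG pathway and correct the resulting p-values for multiple testing.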

Noncoding RNA, radiosensitizer and their target genes analysis
Non-coding RNAs and radiosensitizers typically exert their regulatory function in radiation response by interacting with target genes. Therefore, we collected and analyzed interaction data of ncRNAs and radiosensitizers related to radiation response from various databases (Supplementary Table S2). First, we counted the most studied miRNAs, lncRNAs and radiosensitizers based on the association data documented in dbCRAF. As shown in Figure 3A, the top four miRNAs, lncRNAs and radiosensitizers were all involved in modulating radiation response in one or more cancers. Notably, there were only 31 associations between circRNAs and cancer, making it difficult to rank circRNAs by study frequency. Second, we integrated all target genes and interaction data of ncRNAs and radiosensitizers linked with radiation response, including miRNA-target, lncRNA-target, circRNA-target and radiosensitizer-target interactions. Interestingly, almost 25% of the target genes have been documented to be associated with radiation response (Figure 3B). Third, we counted the top 20 target genes that appeared most frequently in the interaction data and collected evidence for their regulation of radiation response. We hypothesized that target genes that appear more frequently in the interaction data are more likely to regulate radiosensitivity or radioresistance. Consistent with this hypothesis, 19 of the top 20 target genes have been experimentally confirmed to regulate the radiation response (Supplementary Figure S1), with only the FOXO1 gene lacking such evidence.
Therefore, we combined miRNA-target (FOXO1) interactions and relevant associations between miRNAs and cancer recorded in dbCRAF to map FOXO1-miRNA-cancer associations. The results showed that FOXO1 may be associated with radiation responses in lung neoplasms and head and neck neoplasms (Figure 3C). Although there is no direct evidence suggesting that FOXO1 regulates the radiation response of cancer cells, FOXO1 is still an important factor in cancer processes based on its inclusion in the oncogene and tumor suppressor gene list from the Cancer Gene Census project (40). Additionally, Zhou et al. found that TCF19 could promote cell proliferation in non-small cell lung cancer by inhibiting FOXO1 (41). Gong et al. also showed that AKBA inhibited radioresistance in lung cancer through the regulation of the AKT/FOXO1/p21 axis (42). These findings support our conclusion that FOXO1 may be closely related to the radiation response and prognosis after RT in lung cancer.
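The FOXO1-miRNA-cancer mapping described above is essentially a join between miRNA-target interactions and miRNA-cancer associations. A minimal Python sketch follows; the miRNA names are placeholders (not the miRNAs shown in Figure 3C), and the tuple layout and function name are assumptions.

```python
from collections import defaultdict


def map_gene_to_cancers(gene, mirna_targets, mirna_cancer):
    """Link a target gene to cancers through the miRNAs that target it.

    mirna_targets: iterable of (miRNA, target gene) pairs
    mirna_cancer:  iterable of (miRNA, cancer type) associations
    Returns {cancer type: sorted list of linking miRNAs}.
    """
    mirnas = {m for m, target in mirna_targets if target == gene}
    links = defaultdict(set)
    for m, cancer in mirna_cancer:
        if m in mirnas:
            links[cancer].add(m)
    return {cancer: sorted(ms) for cancer, ms in links.items()}


# Placeholder miRNAs; the cancer names follow the MeSH terms used in the text.
mirna_targets = [("miR-X", "FOXO1"), ("miR-Y", "FOXO1"), ("miR-Z", "BCL2")]
mirna_cancer = [
    ("miR-X", "Lung Neoplasms"),
    ("miR-Y", "Lung Neoplasms"),
    ("miR-Y", "Head and Neck Neoplasms"),
    ("miR-Z", "Breast Neoplasms"),
]
foxo1_links = map_gene_to_cancers("FOXO1", mirna_targets, mirna_cancer)
```

The resulting gene-miRNA-cancer triples are the kind of data a Sankey diagram would display, with link width set by the number of supporting associations.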

Key dbCRAF modules and user cases
dbCRAF provides a comprehensive collection of high-quality data, including association data between factors and radiation response in cancer as well as interaction data between factors and target genes. All data are freely available through the Download module, and users can browse and analyze the data using the Data-Matrix, Search and Analysis modules on the dbCRAF website (http://dbCRAF.xialab.info/), as shown in Figure 4.
The 'Data-Matrix' page is an interactive and digitized table organized by factors (rows) associated with radiation response and cancer types (columns), allowing users to quickly navigate all association entries.
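The layout of such a matrix can be sketched in Python as a pivot from association entries to a factor-by-cancer count table. This is only an illustration of the data structure, not the website's implementation; the example associations are made up.

```python
from collections import Counter


def build_matrix(associations):
    """Pivot (factor, cancer type) entries into a count matrix.

    Returns (row labels, column labels, matrix), where matrix[i][j] is the
    number of associations between factor i and cancer type j.
    """
    counts = Counter(associations)
    factors = sorted({factor for factor, _ in counts})
    cancers = sorted({cancer for _, cancer in counts})
    matrix = [[counts[(f, c)] for c in cancers] for f in factors]
    return factors, cancers, matrix


# Made-up entries; real counts come from the dbCRAF association data.
associations = [
    ("EGFR", "Lung Neoplasms"),
    ("EGFR", "Lung Neoplasms"),
    ("EGFR", "Breast Neoplasms"),
    ("ATM", "Breast Neoplasms"),
]
factors, cancers, matrix = build_matrix(associations)
```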
For the Search page, dbCRAF provides user-friendly search options supporting automatic retrieval of association data recorded in dbCRAF under multiple cancer types. In addition, users can query the related data with different evidence in three ways: 'By All', 'By Experiment' and 'By Prediction'. Furthermore, users can click the corresponding entry to view all detailed information, such as key sentences and biological experiment settings.
On the Analysis page, dbCRAF supports four practical analysis functions to answer common biological questions. Users can perform word cloud analysis, KEGG pathway enrichment analysis and interaction network visualization using a list of factors related to radiation response along with their interaction data. Kaplan-Meier survival analysis is implemented with the miRNA-seq, mRNA-seq and clinical data of cancer patients treated with RT in the TCGA database.
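As a small illustration of the word cloud function, font sizes proportional to association counts can be computed as below. The scaling to a maximum font size and the function name are assumptions; the actual rendering on the website may differ.

```python
from collections import Counter


def word_cloud_sizes(factors, max_font=48):
    """Map each factor to a font size proportional to its association count."""
    counts = Counter(factors)
    if not counts:
        return {}
    top = max(counts.values())
    return {factor: max_font * n / top for factor, n in counts.items()}


# Made-up list with one entry per association.
sizes = word_cloud_sizes(["HIF1A"] * 4 + ["EGFR"] * 2 + ["ATM"])
```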

Discussion
To the best of our knowledge, dbCRAF is the first database to furnish association data between factors and human cancer related to multiple radiation response regulatory mechanisms, including radiosensitivity, radioresistance and prognosis of RT in cancer patients. At the same time, we gathered the interaction data concerning ncRNAs and radiosensitizers linked to radiation response. On top of that, dbCRAF allows users to analyze the above information and provides an all-in-one service for the retrieval, analysis and visualization of radiation response factors in human cancer.
For this study, we ran statistical analyses of the association data retrieved from the PubMed literature and of the interaction data and target genes derived from multiple databases, to provide more guidance for researchers. Moreover, we found that out of the 666 coding genes related to tumor RT prognosis, a total of 151 genes had been validated by over-expression or knock-down assays. These assays have demonstrated their potential influence on the radiosensitivity or radioresistance of cancer cell lines. Therefore, it is essential to perform biological experiments on the remaining 515 genes to investigate how they regulate the response to radiation in human cancer cells. Additionally, regarding the miRNA-target interaction data, we observed that hsa-miR-195-5p interacted with BCL2, the target gene with the most interaction data (Supplementary Figure S1). Interestingly, earlier studies revealed that hsa-miR-195-5p could enhance the radiosensitivity of breast cancer cells by repressing the expression of BCL2 (43). Consequently, it is feasible to discover more potential biomarkers of radiation response by combining the associations between factors and cancer in dbCRAF with the corresponding interactions between factors and target genes.
This work also has several limitations that require further exploration. On the one hand, the current database lacks information on radiation toxicity, changes in radiosensitivity, treatment options and machines. It is necessary to explore and include this additional information from relevant literature or databases in the future to enhance the understanding of radiation response. On the other hand, we observed that more than half of the associations are linked to head and neck neoplasms, lung neoplasms and breast neoplasms. This disproportionate distribution may introduce bias in the database and the biological information it contains. To address this issue, we are considering mechanisms such as weighting, where cancer types with fewer associations are given greater weight when inferring potential associations between genes and radiation response in cancer. Furthermore, it is crucial to supplement the associations of cancer types with limited data by incorporating information from additional sources such as Web of Science and Scopus. These efforts will contribute to a more comprehensive representation of radiation response and help mitigate potential bias across various types of cancer.
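The weighting idea is only under consideration and no formula is given, so the following Python sketch shows one possible scheme as an assumption: inverse-frequency weights, normalized so that the weighted number of associations equals the unweighted total. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(cancer_labels):
    """Give cancer types with fewer associations a larger per-entry weight.

    weight(c) = total / (number of cancer types * count(c)), so each cancer
    type contributes the same total weight overall.
    """
    counts = Counter(cancer_labels)
    n_types = len(counts)
    total = len(cancer_labels)
    return {cancer: total / (n_types * n) for cancer, n in counts.items()}


# Toy labels, one per association; the proportions are made up.
labels = ["Lung Neoplasms"] * 3 + ["Skin Neoplasms"]
weights = inverse_frequency_weights(labels)
```

With these weights, an association from a sparsely studied cancer type counts more than one from a heavily studied type, which is the behavior the paragraph above describes.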
In conclusion, with the collated data and the implementation of four online analysis tools, dbCRAF is not only an integrated resource but also a web server for mining the potential molecular mechanisms of radiation response. The data and functions of dbCRAF may provide new perspectives on the development of RT therapeutics for human cancer.

Figure 1. Summary of dbCRAF contents. (A) Statistics of publication information. (B) Statistics of association data with each cancer type. (C) Statistics of association data with each kind of factor.

Figure 2. Overview of protein-coding gene-cancer associations. (A) Word cloud of all protein-coding genes, where genes with more literature evidence are displayed using a larger font size. (B) Number of coding genes derived from Experiment or Prediction publications. (C) The top 20 KEGG enrichment pathways of coding genes with experimental evidence (793 genes) or only computational analysis results (515 genes).

Figure 3. Statistics related to non-coding RNAs, radiosensitizers and their target interactions. (A) Statistics of the associations about top four miRNAs, lncRNAs and radiosensitizers, where more frequent associations are displayed using a thicker line. (B) Venn diagram of genes recorded in dbCRAF and all target genes. (C) Sankey diagram of FOXO1-miRNA-cancer associations.

Table 1. Statistics and comparison of the dbCRSR and dbCRAF databases. a Radiosensitivity. b Radioresistance.